keywords:"AVX" - Search Results - Digital Repository


	Assisted Code Vectorization and Parallelization Using the OpenMP 4.0 Standard
		Slouka, Lukáš ; Nikl, Vojtěch (referee) ; Jaroš, Jiří (advisor)
		This bachelor's thesis deals with code optimization using the OpenMP 4.0 standard, which provides tools for assisted parallelization and vectorization. In addition to describing the OpenMP 4.0 standard, the thesis gives an insight into the architecture of modern computers, specifically the cache memory hierarchy and the SSE/AVX units, which play a major role in optimization. The thesis demonstrates the advantages of optimized code over unoptimized code on a set of benchmarks, each aimed at a different aspect of optimization.
	Processing units of last generation processors and their utilization
		Šlenker, Samuel ; Pavlíček, Tomáš (referee) ; Balík, Miroslav (advisor)
		The aim of this thesis was to study and describe the differences between older and newer instruction sets, to specify the benefits of the individual extensions, to compare how the individual SIMD processing units perform computations, and to compare how Intel and AMD implement these units. The work also includes two theoretical introductions to laboratory tasks.
	Parallelization of Ultrasound Simulations Using 2D Decomposition
		Nikl, Vojtěch ; Dvořák, Václav (referee) ; Jaroš, Jiří (advisor)
		This thesis is part of the k-Wave project, a toolbox for the simulation and reconstruction of acoustic wave fields, one of whose main contributions is the planning of focused ultrasound surgeries (HIFU). One simulation can take tens of hours, and about 60% of the simulation time is spent computing 3D fast Fourier transforms (FFTs). Until now, the 3D FFT has been computed purely by the FFTW library using its 1D decomposition, whose major limitation is the maximum number of cores that can be employed. We therefore introduce a new approach, the 2D hybrid decomposition of the 3D FFT (HybridFFT), which combines MPI processes and OpenMP threads to achieve the best possible performance. On a low number of cores, on the order of a few hundred, it is about as fast as or slightly faster than FFTW and the pure-MPI 2D decomposition libraries (PFFT and P3DFFT). One of the best results was achieved on a 512^3 FFT using 512 cores, where our hybrid version ran in 31 ms, FFTW in 39 ms, and PFFT in 44 ms. The most significant performance advantage should appear when employing around 8-16 thousand cores; however, we have not had access to a machine with such resources. Almost linear scalability has been demonstrated for up to 2048 cores.
	Acceleration of Vector and Cryptographic Operations on x86-64 Platform
		Šlenker, Samuel ; Martinásek, Zdeněk (referee) ; Balík, Miroslav (advisor)
		The aim of this thesis was to study and compare older and newer SIMD processing units of modern microprocessors on the x86-64 platform. The thesis provides an overview of the fastest ways to compute vector operations on matrices and vectors, including the corresponding source code. It also covers authenticated encryption, specifically the AES block cipher operating in Galois/Counter Mode, and discusses the possibilities offered by instruction sets for cryptographic support.
